Generation of Pattern-Matching Algorithms by Extended Regular Expressions
Abstract
It is difficult to express the definition of comments in the C language as a regular expression. However, the definition can be expressed by a simple regular expression by introducing a special symbol, called the any-symbol, that represents any single character, or by introducing a kind of negation symbol into regular expressions. In general, the problem of string pattern matching can be expressed as such an extended regular expression, and the corresponding finite state automaton generated from the expression is equivalent to the Knuth-Morris-Pratt pattern-matching algorithm [4]. In particular, if we use the any-symbols, the pattern is not restricted to a string of characters; it can be any regular expression. Our method can also be applied to the problem of repeated pattern matching. The Aho-Corasick algorithm [3] can be derived mechanically from an extended regular expression that contains any-symbols.

1 Introduction

The definition of a comment in the C language can be given as a string composed of three strings in the following order: /*, a string that does not contain */ (possibly the empty string), and */. Expressing the above definition by a regular expression is complicated because it is difficult to express "a string that does not contain */." Here, */ means a string of length two composed of * followed by /. Although we can use the negation symbols in lex, a well-known scanner generator, the expression for the comment is still complex: for example, the repetition of the immediate predecessor zero or more times, and [^*/] expresses one character that is neither * nor /. This problem can be simply expressed if we can designate "a string that does not contain */" directly. If the expression of the above string is
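The link between "any-symbol, repeated, followed by a pattern" and a Knuth-Morris-Pratt-style automaton can be illustrated with a minimal Python sketch. It is not the paper's derivation from extended regular expressions: the transition table is built by brute force, and the names `build_match_dfa`, `find_all`, and `scan_comment` are hypothetical helpers introduced only for this illustration. State k means "the longest prefix of the pattern that is a suffix of the input read so far has length k"; any character not occurring in the pattern plays the role of the any-symbol and returns to state 0.

```python
def build_match_dfa(pattern):
    """Transition table of a DFA for (any)* pattern.

    delta[k][c] is the next state after reading character c in state k.
    Characters absent from the table (the any-symbol case) lead to state 0.
    """
    m = len(pattern)
    alphabet = set(pattern)
    delta = [dict() for _ in range(m + 1)]
    for k in range(m + 1):
        for c in alphabet:
            seen = pattern[:k] + c
            # Longest prefix of pattern that is also a suffix of `seen`.
            j = min(m, len(seen))
            while j > 0 and pattern[:j] != seen[-j:]:
                j -= 1
            delta[k][c] = j
    return delta


def find_all(pattern, text):
    """End positions of every (possibly overlapping) occurrence of pattern."""
    delta = build_match_dfa(pattern)
    m = len(pattern)
    state = 0
    ends = []
    for i, c in enumerate(text):
        state = delta[state].get(c, 0)
        if state == m:
            ends.append(i + 1)
    return ends


def scan_comment(text, start=0):
    """End index of the C comment beginning at text[start], or -1 if none.

    After the opening "/*", the body is the shortest string ending in "*/",
    which is exactly what the DFA for (any)* "*/" detects first.
    """
    if not text.startswith("/*", start):
        return -1
    delta = build_match_dfa("*/")
    state = 0
    for i in range(start + 2, len(text)):
        state = delta[state].get(text[i], 0)
        if state == 2:
            return i + 1
    return -1
```

For instance, `scan_comment("/* a * b **/ x */")` stops at the first closing `*/`, and `scan_comment("/*/")` reports no comment because the `*` of the opener is not reused for the closer. The scan reads each input character once and never backs up in the text, which is the property shared with Knuth-Morris-Pratt.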
Similar resources
Verified Decision Procedures for MSO on Words
Monadic second-order logic on finite words (MSO) is a decidable yet expressive logic into which many decision problems can be encoded. Since MSO formulas correspond to regular languages, equivalence of MSO formulas can be reduced to the equivalence of some regular structures (e.g. automata). This paper presents a verified functional decision procedure for MSO formulas that is not based on autom...
High-speed String and Regular Expression Matching on FPGA
In recent FPGA research, much attention has been paid to dynamically reconfigurable algorithms that can modify their configuration on the fly. In this paper, we report recent progress on dynamically reconfigurable hardware on FPGA for high-speed string and regular expression matching, which has been developed by our group since 2008. In particular, we describe the architecture, algorithms,...
Extending Regular Expressions with Context Operators and Parse Extraction
Regular expressions are used in many applications to specify patterns because any regular expression can be compiled into a very efficient one-pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to fin...
A Boyer-Moore (or Watson-Watson) Type Algorithm for Regular Tree Pattern Matching
In this paper, I outline a new algorithm for regular tree pattern matching. The Boyer-Moore family of string pattern matching algorithms are considered to be among the most efficient. The Boyer-Moore idea of a shift distance was generalized by Commentz-Walter for multiple keywords, and generalizations for regular expressions have also been found. The existence of a further generalization to tree ...
A Multi-pattern Matching Algorithm Based on WM Algorithm
Research on pattern-matching algorithms is an important subject in the field of computer science. The algorithms range from single-pattern and multi-pattern matching algorithms to extended character matching and regular expression matching. Among the many multi-pattern matching algorithms, the AC algorithm and the WM algorithm would be the two most classical algorithms, but these two algor...